The New Science of Sentencing

Criminal sentencing has long been based on the present crime and, sometimes, the defendant’s past criminal record. In Pennsylvania, judges could soon consider a new dimension: the future.

Pennsylvania is on the verge of becoming one of the first states in the country to base criminal sentences not only on what crimes people have been convicted of, but also on whether they are deemed likely to commit additional crimes. As early as next year, judges there could receive statistically derived tools known as risk assessments to help them decide how much prison time — if any — to assign.

This story was produced in collaboration with FiveThirtyEight.

Risk assessments have existed in various forms for a century, but over the past two decades, they have spread through the American justice system, driven by advances in social science. The tools try to predict recidivism — repeat offending or breaking the rules of probation or parole — using statistical probabilities based on factors such as age, employment history and prior criminal record. They are now used at some stage of the criminal justice process in nearly every state. Many court systems use the tools to guide decisions about which prisoners to release on parole, for example, and risk assessments are becoming increasingly popular as a way to help set bail for inmates awaiting trial.

But Pennsylvania is about to take a step most states have until now resisted for adult defendants: using risk assessment in sentencing itself. A state commission is putting the finishing touches on a plan that, if implemented as expected, could allow some offenders considered low risk to get shorter prison sentences than they would otherwise or avoid incarceration entirely. Those deemed high risk could spend more time behind bars.

Pennsylvania, which already uses risk assessment in other phases of its criminal justice system, is considering the approach in sentencing because it is struggling with an unwieldy and expensive corrections system. Pennsylvania has roughly 50,000 people in state custody, 2,000 more than it has permanent beds for. Thousands more are in local jails, and hundreds of thousands are on probation or parole. The state spends $2 billion a year on its corrections system — more than 7 percent of the total state budget, up from less than 2 percent 30 years ago. Yet recidivism rates remain high: 1 in 3 inmates is arrested again or reincarcerated within a year of being released.

States across the country are facing similar problems — Pennsylvania’s incarceration rate is almost exactly the national average — and many policymakers see risk assessment as an attractive solution. Moreover, the approach has bipartisan appeal: Among some conservatives, risk assessment appeals to the desire to spend tax dollars on locking up only those criminals who are truly dangerous to society. And some liberals hope a data-driven justice system will be less punitive overall and correct for the personal, often subconscious biases of police, judges and probation officers. In theory, using risk assessment tools could lead to both less incarceration and less crime.

There are more than 60 risk assessment tools in use across the U.S., and they vary widely. But in their simplest form, they are questionnaires — typically filled out by a jail staff member, probation officer or psychologist — that assign points to offenders based on anything from demographic factors to family background to criminal history. The resulting scores are based on statistical probabilities derived from previous offenders’ behavior. A low score designates an offender as “low risk” and could result in lower bail, less prison time or less restrictive probation or parole terms; a high score can lead to tougher sentences or tighter monitoring.

The risk assessment trend is controversial. Critics have raised numerous questions: Is it fair to make decisions in an individual case based on what similar offenders have done in the past? Is it acceptable to use characteristics that might be associated with race or socioeconomic status, such as the criminal record of a person’s parents? And even if states can resolve such philosophical questions, there are also practical ones: What to do about unreliable data? Which of the many available tools — some of them licensed by for-profit companies — should policymakers choose?

Even some supporters of risk assessment in bail and parole worry that using the tools for sentencing carries echoes of “Minority Report”: locking people up for crimes they might commit in the future. In a speech to the National Association of Criminal Defense Lawyers last August, then-Attorney General Eric Holder said risk assessment tools can be useful in directing offenders toward rehabilitative programs, allowing them to shorten their prison sentences. But he criticized the use of such tools at the sentencing phase. “By basing sentencing decisions on static factors and immutable characteristics — like the defendant’s education level, socioeconomic background, or neighborhood — they may exacerbate unwarranted and unjust disparities that are already far too common in our criminal justice system and in our society,” he said.

Milton Fosque remembers it as common and acceptable to drink and drive back when he started doing it, in the 1970s. He was in the Army then, and alcohol was part of his routine. “We drank because we were men,” he said. “That’s the way it was in the Army. You did your duty for the day, and then you went straight to the bar.”

Fosque, now 58, lives in Philadelphia, the city where he was born and raised. A heavyset man with the neatly shorn head of a serviceman, he says he quit drinking a few times over the years, but never for long. That changed in 2012, when he was arrested for the third time in four years for driving under the influence. Pennsylvania takes a tiered approach to DUIs; Fosque received a combined 90 days of jail and one year of probation as a result of the first two arrests. For the third, state law dictates one to five years in jail. The judge sentenced Fosque to a year behind bars and five years of probation.

In 2010, between Fosque’s first and second arrests, Pennsylvania legislators passed a law with a number of reforms intended to deflate the state’s ballooning prison system. It included changes to parole and treatment programs, as well as a provision to reduce the number of people in prison for technical parole violations. Also included in that law was a mandate that the state create a risk assessment for sentencing to use at an unprecedented level — in nearly every state courtroom, for nearly every type of crime (the exception will be a limited group of minor offenses and misdemeanors). Once it goes into effect, the tool will help determine the sentences of thousands of people like Fosque every year.

The decision of what to do with that mandate was given to the state Commission on Sentencing, and after years of research, the commission’s work is nearing completion. Although final recommendations won’t be ready until the beginning of 2016 at the earliest, a series of reports lay out what the tool should look like and how the information will be presented to judges.

Fosque wasn’t sentenced using this risk assessment, but his case can illustrate how Pennsylvania’s proposed tool is supposed to work. The gravity of the crime determines what questions are asked and how many points the answers are worth, although the severity of the crime doesn’t factor into the final score.

In just about any risk assessment, prior criminal activity is considered the most predictive measure, and in the Pennsylvania tool, prior arrests can be worth several points. Fosque has been arrested numerous times in his life, so he would get four points. He’s male, which is worth another point, and lives in an urban county, one more point. Those qualities combined give him a starting score of 6 out of a possible 13, putting him in the range of moderate risk. Along with the sentencing guidelines, a judge would see a chart showing that people who fit this description have a 49 percent recidivism rate.
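The arithmetic behind a score like Fosque's can be sketched in a few lines of Python. This is only an illustration, not the commission's instrument: the three point values are the ones given in the example above, the trait names are hypothetical labels, and the draft tool's other items and weights are not shown because the text does not give them.

```python
# Illustrative only: point values come from the Fosque example in the text.
# The trait names are hypothetical labels, and the draft tool's remaining
# items and weights are not given here.
EXAMPLE_POINTS = {
    "numerous_prior_arrests": 4,
    "male": 1,
    "urban_county": 1,
}
MAX_SCORE = 13  # maximum possible score cited in the text


def risk_score(traits):
    """Sum the points for each trait that applies to the offender."""
    return sum(EXAMPLE_POINTS[trait] for trait in traits)


fosque_traits = ["numerous_prior_arrests", "male", "urban_county"]
score = risk_score(fosque_traits)
print(f"{score} out of {MAX_SCORE}")
```

The sketch stops at the raw score. Turning that number into a label such as "moderate risk," or into a recidivism rate such as 49 percent, depends on cutoffs and historical data that are not specified here.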

Fosque, however, says the chance he will commit another crime is zero. After a year in jail, he’s now out on parole and says he has been sober since he was last arrested, in 2012. He was elected to the board of the re-entry program he attended, is active at his church, and has been working on lifelong family issues with the help of a social worker. He’s even fixing up his home. “I’m not going back there,” he said of his time inside.

Fosque is quick to talk about drinking and the life choices that landed him in jail. But he also feels he owns the responsibility and effort it has taken to stay sober. He hadn’t heard of risk assessment, but after he was told that the tools were used to determine which facility he served time in and what level of supervision he received on parole, he looked them up online. “You mean to tell me they’re using statistics to determine what’s going to happen to me?” he asked. “That ain’t right.”

Fosque’s objection underscores one of the central questions in the risk assessment debate: Is it fair to look at the behavior of a group when deciding the fate of an individual? Statistics, after all, can’t say whether Fosque will commit another crime, and he believes he’s doing everything possible to avoid further run-ins with the law.

Sonja Starr, a University of Michigan law professor who has been a leading opponent of risk assessment, says it isn’t fair. “These instruments aren’t about getting judges to individually analyze life circumstances of a defendant and their particular risk,” she said. “It’s entirely based on statistical generalizations.”

Supporters of the tools counter that judges, parole boards and other decision-makers already make their own risk assessments, whether or not they call them that. The difference is that people aren’t as good as statistics at predicting who is most likely to commit crimes in the future. In the 1960s, before the current burst of research on risk, one common misconception among correctional experts was that people with mental illness were more likely to be repeat, violent offenders. They aren’t, research shows. Formal risk assessments offer greater transparency and, according to numerous studies, greater accuracy than the ad hoc systems they are replacing. Yet in most cases, the tools’ recommendations are only advisory. Judges can — and do — choose to disregard their suggestions for many reasons, including because they prefer exercising professional discretion or because they feel the tool fails to account for an important aspect of the defendant or his or her crime.

Using a questionnaire “doesn’t guarantee a probation officer won’t give a kid a higher risk score because he thinks the kid wears his pants too low,” said Adam Gelb, director of the public safety performance project at the Pew Charitable Trusts. But, he said, risk assessment creates a record of how officials are making decisions. “A supervisor can question, ‘Why are we recommending that this kid with a minor record get locked up?’ Anything that’s on paper is more transparent than the system we had in the past. In many cases, you had no idea from probation officer to probation officer, let alone from judge to judge, what was in people’s heads. There was no transparency, and decisions could be based on just about any bias or prejudice.”

The developers of Pennsylvania’s tool have largely avoided the underlying philosophical questions raised by risk assessment. Mark Bergstrom, the Sentencing Commission’s executive director, says it’s not up to him whether the state should use a risk assessment in sentencing, since the legislature has already voted to do so. His job is to figure out what it will look like and how it will be implemented.

At their most basic, risk assessment tools are all built in essentially the same way: Social scientists look at a large population of former prisoners, examine hundreds of facts about their lives, and then follow the individuals over several years to see which traits are associated with further criminal activity. Criminologists have identified various factors that appear linked to continued criminal activity, such as feeling proud of breaking the law or having marital or substance abuse problems. But from a raw statistical standpoint, three factors are far and away the most predictive: sex, age and prior criminal history.

There is little question that well-designed risk assessment tools “work,” in that they predict behavior better than unaided expert opinion. Over the past several decades, dozens of social scientific studies have been published comparing professional predictions of risk to predictions made by statistics. When implemented correctly, whether in the fields of medicine, finance or criminal justice, statistical actuarial tools are accurate at predicting human behavior — about 10 percent more accurate than experts assessing without the assistance of such a tool, according to a 2000 paper by a team of psychologists at the University of Minnesota.

But to critics, just because a trait predicts crime doesn’t mean it’s fair to use it in sentencing decisions. Pennsylvania’s proposed tool will take into account factors like sex and age that are beyond an individual’s control. It will also include a question on where offenders live and, in some cases, penalize residents of urban areas, who are far more likely to be black.

Perhaps most controversially, the Sentencing Commission’s draft assessment tool will factor in an individual’s history of arrests, not just convictions. Even using convictions is potentially problematic; blacks are more likely than whites to be convicted of marijuana possession, for example, even though they use the drug at rates equivalent to whites. But arrests are even more racially skewed than convictions, and public defender groups in Pennsylvania think their use to determine sentencing may be unconstitutional.

Bradley Bridge, an attorney with the Defender Association of Philadelphia, points to differences in policing around the state, which he says can have a dramatic effect on arrests. Heavy policing in some neighborhoods in Philadelphia makes low-income and nonwhite residents more likely to be arrested, whether or not they’ve committed more or worse crimes.

“This is a compounding problem,” Bridge said. “Once they’ve been arrested once, they are more likely to be arrested a second or a third time — not because they’ve necessarily done anything more than anyone else has, but because they’ve been arrested once or twice beforehand.”

Even many people who defend risk assessment in theory say it can be problematic in practice. Official records can contain mistakes. Tools intended for one purpose can be used for another. Many tools include questions that are subjective, requiring that the person filling out the questionnaire characterize the offender’s feelings and attitudes. That process can introduce error.

A probation officer in Ohio said he regularly deals with the practical challenges surrounding risk assessment in the community-based correctional facility where he works. The facility houses felons who are given one last chance to straighten out — if they reoffend, they can be sent to state prison. Residents are sentenced to four- to six-month stays and receive counseling, addiction treatment, and educational and vocational training.

Research shows there are benefits to letting low-risk offenders avoid jail or prison time. When they are incarcerated alongside high-risk offenders, the likelihood that they will break the law again increases. At the Ohio facility, each resident is housed in either a “high risk” or “moderate risk” dormitory, depending on the score he received on a risk assessment, typically administered during a pretrial interview. Yet some judges sentence defendants with low risk scores to the facility. “If the judge wants to send somebody here, they will say, ‘I don’t give a damn, they’re going,’ ” said the probation officer, who asked to remain anonymous because he was not authorized to speak to a journalist. “It could be a first-time felony offender with no criminal history — the definition of low risk. We’ll put the low-risk [residents] in with the moderates and try to help out as much as possible.” (The Ohio Department of Rehabilitation and Correction said its administrative rules allow offenders with low risk scores to be sentenced to this type of facility, but only if they have serious substance abuse problems.)

After completing the facility’s programming, residents are released and typically serve two to three additional years of probation. The probation officer visits clients at home, gives them drug tests, and counsels them on finding employment and improving their personal relationships. After a few months, the officer typically uses another risk assessment tool to evaluate a probationer’s progress and to determine whether they can meet less frequently. But since his facility adopted the approach in 2011, the officer has noticed that probationers have become savvier about the interview and more calculating in their answers.

“I don’t think they’re all lying, but these guys have figured out the importance of these [assessments] and what can happen as a result,” the officer said. The Ohio risk assessment system recommends spending 45 to 60 minutes on each interview of this type, but the officer says that understaffing at his facility and a 160-person caseload mean he spends only 15 to 20 minutes on each interview. The hardest risk factors to assess, he said, are those related to the subject’s attitudes: whether he feels pride in his criminal behavior, is willing to walk away from a fight, or follows the Golden Rule.

He said he listens for statements that could indicate a dangerous attitude problem — for example, “You gotta do what you gotta do” and “I gotta look out for me.” But “to do one of these accurately and really dig deep to make sure you’re getting good answers takes more time than we have,” he said.

Pennsylvania’s Sentencing Commission is trying to avoid some of these challenges by designing its risk assessment tool to use only information that comes from databases, not interviews. Prosecutors and defense lawyers can see all the information that goes into the scoring and will have an opportunity to verify its accuracy and to ask for changes if something is incorrect.

But even when they are properly administered, many of the most widely used tools are blunt instruments. A tool used in Pennsylvania’s Corrections Department, for example, asks if an inmate has ever had a drug or alcohol problem, with no distinction based on severity. Such tools often make no effort to assess how different variables interact: Does drug use matter more among younger people? Does education matter less if someone is employed? And many jurisdictions use tools that weren’t explicitly designed for — and in some cases haven’t been fully tested on — local populations.

Richard Berk, a University of Pennsylvania statistician, said the most widely used tools are “a generation behind a lot of the developments that are going on in computer science and statistics.” Berk has been at the national forefront of efforts to bring risk assessment into the modern era. He has developed assessment tools that use a more advanced statistical discipline known as machine learning. In essence, Berk feeds a huge amount of data into a computer and lets a program figure out which variables matter and how much. He argues that his approach generates predictions that are both more accurate and more finely tuned — distinguishing, for example, between violent and nonviolent crimes.

The Adult Probation and Parole Department in Philadelphia was one of the first to try Berk’s method in the real world, and its experience in many ways shows the promise of risk assessment. When the department began exploring risk assessment in the mid-2000s, its roughly 275 case officers oversaw 50,000 people — too many to manage effectively. Cases were divided among officers regardless of the seriousness of the offense, and it was left to individual officers to decide how closely to supervise their charges. Ellen Kurtz, who spent 10 years as director of research for the department before leaving in June, said the system was failing at all levels. It wasn’t fair to officers, who were given little guidance and frequently suffered burnout, or to offenders, who weren’t being treated equally. And just as problematic, the system wasn’t working; rates of recidivism, in particular violent recidivism, were high across the city. “It was all intuitive, gut-based decision-making,” Kurtz said.

Berk developed a tool that sorts offenders into three categories: The highest-risk offenders are considered likely to commit a subsequent violent offense. Medium-risk offenders are equally likely to commit a new crime, but in a nonviolent way. And low-risk offenders are unlikely to break the law again. In 2009, the department not only adopted Berk’s tool, but it also completely changed its approach. Case officers were reassigned to deal exclusively with high-, medium- or low-risk offenders. Each category of offender is treated differently; high-risk offenders have to check in regularly in person, while low-risk offenders can check in less often, usually online or by phone.

The reform was controversial at the time but is now widely seen as a success. A randomized controlled trial completed in 2008 tested the new system: Offenders deemed low-risk were randomly assigned either to the new, less onerous supervision system or the stricter version previously in place. Under the new system, offenders faced far fewer drug tests and were told to report to their parole officers less than half as often. Parole officers were also able to oversee far more people. The laxer supervision didn’t lead to a meaningful increase in arrest rates; in fact, arrests on serious charges were lower under the new system, although the difference wasn’t statistically significant. Kurtz said recidivism — especially violent recidivism — has fallen in the years since.

Using risk assessment in criminal sentencing is a thornier issue. “It’s a higher-stakes decision point in terms of someone’s liberty,” Kurtz said. “It definitely makes me a little bit more uncomfortable.” There are only a handful of examples of states using risk assessment in sentencing, and even those use it in very limited ways. Florida uses the Positive Achievement Change Tool to help probation officers and judges determine outcomes for juvenile offenders, including sentencing. A 2014 state report showed that juveniles whose sentences followed state guidelines derived from the tool’s various risk levels were half as likely to reoffend within 12 months as juveniles sentenced outside the guidelines.

Virginia mostly uses risk assessment to identify the lowest-risk offenders and divert them into alternatives to incarceration. Even so, its decade-old policy has been controversial. The ACLU challenged the constitutionality of the law, arguing that basing sentences on statistical correlations, rather than the details of a specific case, “cuts to the core of the fundamental Constitutional principles of equality and fairness.”

A state appeals court dismissed the challenge on the grounds that under Virginia’s law, the risk assessment scores were only advisory — judges can and do disregard them. A 2014 analysis by the state’s Criminal Sentencing Commission found that judges disregard sentencing guidelines roughly 20 percent of the time.

The commission has released aggregate figures showing that incarceration and recidivism rates in the state have both fallen since it began using risk assessment in sentencing. But in a long fight with the Daily Press, a newspaper in southeastern Virginia, the commission has refused to release more detailed data that could reveal the policy’s impact on racial and other disparities. Rob Poggenklass, an ACLU attorney, said the lack of data makes it difficult to evaluate the true impact of the state’s risk assessment policy. “It’s sort of tricky to make any big pronouncements about whether it’s working,” he said.

Indeed, it has proved remarkably difficult to evaluate the real-world impact of risk assessment, positive or negative. As in Virginia, states have often released only limited data, and even where they have been more forthcoming, the latest generation of risk assessment tools is still too new for conclusions to be drawn about their long-term effects. Randomized experiments like the one conducted in Philadelphia’s probation system are all but impossible in sentencing and are generally rare in criminal justice. And, as in Pennsylvania, risk assessment tools are often adopted as part of larger criminal justice reforms, making it hard to isolate their effects.

The studies that have tried to overcome such challenges have shown mixed, though generally positive, results. In the juvenile justice system, the nonprofit Annie E. Casey Foundation has encouraged hundreds of jurisdictions to adopt risk assessment tools as part of a broader package of reforms intended to reduce the number of incarcerated youth. Across all its partner sites, Casey reports a 46 percent reduction in the detention of youth of color and a 44 percent reduction in the detention of white youth, although those results cannot be attributed to risk assessment alone.

In the adult system, the results have generally been more modest. Kentucky adopted a risk assessment tool developed by the nonprofit Arnold Foundation for use in its bail decisions. It has released more defendants and re-arrested fewer of them, but the change has been far from dramatic. The percentage of defendants the state released pretrial went up 2 points in the six months after the tool was introduced, Arnold reports, while the rate of new arrests for defendants awaiting trial declined to 8.5 percent from 10 percent. (The Arnold Foundation is a funder of The Marshall Project.)

Determining other impacts of risk assessment is even harder. Jennifer Skeem, a University of California, Berkeley, psychologist who has written extensively on risk assessment, said there simply isn’t enough data available to say with certainty whether it reduces racial disparities in the justice system. But she said better data alone won’t be enough to resolve the questions the tools raise.

“I’m not convinced that when we do have the evidence, that it’s going to shut down the debate” because there will still be more fundamental questions, she said.

The core questions around risk assessment aren’t about data. They are about what the goals of criminal justice reforms should be. Some supporters see reducing incarceration as the primary goal; others want to focus on reducing recidivism; still others want to eliminate racial disparities. Risk assessments have drawn widespread support in part because, as long as they remain in the realm of the theoretical, they can accomplish all those goals. But once they enter the real world, there are usually trade-offs.

Risk assessment tools can determine that a person like Milton Fosque has a 49 percent chance of committing another crime. What they can’t decide is what to do with that information. Should 49 percent be considered high risk or low? Should Fosque be in prison? On probation? In treatment? Berk, the University of Pennsylvania statistician, said those decisions have to be made by policymakers and the public, not researchers.

“I’m not trying to design interventions that turn bad guys into good guys,” Berk said. “My job is to provide, I hope, better information to inform whatever decisions are being made.”

In Pennsylvania, at least, such policy discussions have drawn little public attention despite the best efforts of the Sentencing Commission, which in addition to publishing its detailed reports has held public hearings across the state. Those hearings drew so few people that Bergstrom, the commission’s executive director, extended the public comment period through the end of the year.

Bergstrom, who has run the commission for nearly two decades, is walking a delicate line. He said he wants to create a tool that accurately predicts behavior while avoiding endless lawsuits. The commission’s research has found that prior arrests are a better predictor of recidivism than prior convictions. But using arrests would almost certainly draw a constitutional challenge from the state’s public defenders. They point to the racial disparities in arrest rates and say it’s illegal to presume someone is guilty just because he was arrested.

Based on the work the commission has done so far, Bergstrom says he’s leaning toward using the tool to identify outliers: low-risk individuals to divert from prison altogether and high-risk individuals to flag for extra time or treatment. That would be a fairly limited approach, but it wouldn’t avoid the central question of whether offenders should spend more time behind bars simply because of how statistical tools say they will behave in the future.
